Smart Paper Notes

HuSpaCy: an industrial-strength Hungarian natural language processing toolkit

György Orosz , Zsolt Szántó , Péter Berkecz , Gergő Szabó , Richárd Farkas

Categories: Natural Language Processing | Machine Learning (Statistics)

2022-01-06

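Although several open-source language processing pipelines are available for Hungarian, none of them satisfies the requirements of today's NLP applications. A language processing pipeline should provide close to state-of-the-art lemmatization, morphosyntactic analysis, entity recognition, and word embeddings. Industrial text processing applications must also meet non-functional software quality requirements, and frameworks that support multiple languages are increasingly favored. This paper introduces HuSpaCy, an industry-ready Hungarian language processing pipeline. The tool provides components for the most important basic linguistic analysis tasks. It is open source and available under a permissive license. The system is built on spaCy's NLP components, which means that it is fast, has a rich ecosystem of NLP applications and extensions, and comes with extensive documentation and a well-known API. Besides an overview of the underlying models, we also present a rigorous evaluation on common benchmark datasets. Our experiments confirm that HuSpaCy achieves high accuracy in all subtasks while maintaining resource-efficient prediction capabilities.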

Hyperactive Learning (HAL) for Data-Driven Interatomic Potentials

Cas van der Oord , Matthias Sachs , Dávid Péter Kovács , Christoph Ortner , Gábor Csányi

Categories: Machine Learning (Statistics)

2022-10-09

Data-driven interatomic potentials have emerged as a powerful class of surrogate models for ab initio potential energy surfaces that are able to reliably predict macroscopic properties with experimental accuracy. In generating accurate and transferable potentials, the most time-consuming and arguably most important task is generating the training set, which still requires significant expert user input. To accelerate this process, this work presents hyperactive learning (HAL), a framework for formulating an accelerated sampling algorithm specifically for the task of training database generation. The key idea is to start from a physically motivated sampler (e.g., molecular dynamics) and add a biasing term that drives the system towards high uncertainty and thus to unseen training configurations. Building on this framework, general protocols for building training databases for alloys and polymers leveraging the HAL framework will be presented. For alloys, ACE potentials for AlSi10 are created by fitting to a minimal HAL-generated database containing 88 configurations (32 atoms each) with fast evaluation times of under 100 microseconds per atom per CPU core. These potentials are demonstrated to predict the melting temperature with excellent accuracy. For polymers, a HAL database is built using ACE, able to determine the density of a long polyethylene glycol (PEG) polymer formed of 200 monomer units with experimental accuracy by only fitting to small isolated PEG polymers with sizes ranging from 2 to 32.
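
The biasing idea in the abstract above can be illustrated with a one-dimensional toy. This is only a sketch under stated assumptions, not the authors' implementation: the three-member harmonic "committee", the use of its standard deviation as the uncertainty, the biased energy written as mean energy minus tau times uncertainty, and the Metropolis Monte Carlo sampler (standing in for molecular dynamics) are all illustrative choices.

```python
import math
import random

def committee_energies(x, stiffnesses=(0.8, 1.0, 1.2)):
    """Hypothetical committee of harmonic models; they agree at x = 0 and disagree far from it."""
    return [0.5 * k * x * x for k in stiffnesses]

def mean_energy(x):
    energies = committee_energies(x)
    return sum(energies) / len(energies)

def uncertainty(x):
    """Committee standard deviation, used here as an assumed uncertainty measure."""
    energies = committee_energies(x)
    mean = sum(energies) / len(energies)
    return math.sqrt(sum((e - mean) ** 2 for e in energies) / len(energies))

def biased_energy(x, tau):
    """Assumed bias form: lower the energy where the committee is uncertain."""
    return mean_energy(x) - tau * uncertainty(x)

def metropolis(energy, n_steps, step=1.0, kT=1.0, seed=0):
    """Plain Metropolis sampler; a stand-in for the physically motivated sampler."""
    rng = random.Random(seed)
    x, e = 0.0, energy(0.0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)
    return samples

def mean_square(samples):
    return sum(v * v for v in samples) / len(samples)

plain = metropolis(lambda x: biased_energy(x, 0.0), 20000)
biased = metropolis(lambda x: biased_energy(x, 3.0), 20000)
print(mean_square(plain), mean_square(biased))
```

In this toy the biased run spreads further from x = 0, where the committee members disagree more, which is the qualitative effect the abstract attributes to the biasing term.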

Backflipping with Miniature Quadcopters by Gaussian Process Based Control and Planning

Péter Antal , Tamás Péni , Roland Tóth

Categories: Robotics

2022-09-29

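The paper proposes two control methods for performing a backflip maneuver with miniature quadcopters. First, an existing feedforward control strategy designed specifically for backflipping is revised and improved. Bayesian optimization with a surrogate Gaussian process model is used to find the optimal sequence of motion primitives by repeatedly executing the flip maneuver in a simulation environment. The second approach is based on closed-loop control and consists of two main steps. First, an adaptive controller is designed to provide reliable reference tracking even under model uncertainty. The controller is constructed by augmenting the nominal model of the drone with a Gaussian process that is trained on measurement data. Second, an efficient trajectory planning algorithm is proposed that designs feasible trajectories for the backflip maneuver using only quadratic programming. The two approaches are analyzed in simulation and in experiments with Bitcraze Crazyflie 2.1 quadcopters.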

Domain adaptation strategies for cancer-independent detection of lymph node metastases

Péter Bándi , Maschenka Balkenhol , Marcory van Dijk , Bram van Ginneken , Jeroen van der Laak , Geert Litjens

Categories: Computer Vision | Machine Learning

2022-07-13

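Recently, large, high-quality public datasets have led to the development of convolutional neural networks that can detect lymph node metastases of breast cancer at the level of expert pathologists. Many cancers, regardless of the site of origin, can metastasize to lymph nodes. However, collecting and annotating high-volume, high-quality datasets for every cancer type is challenging. In this paper, we investigate how to leverage existing high-quality datasets most efficiently in multi-task settings for closely related tasks. Specifically, we explore different training and domain adaptation strategies, including the prevention of catastrophic forgetting, for detecting colon and head-and-neck cancer metastases in lymph nodes. Our results show state-of-the-art performance on both cancer metastasis detection tasks. Furthermore, we show the effectiveness of repeatedly adapting networks from one cancer type to another to obtain multi-task metastasis detection networks. Finally, we show that leveraging existing high-quality datasets can significantly boost performance on new target tasks and that catastrophic forgetting can be effectively mitigated using regularization.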

Learning the parameters of a differential equation from its trajectory via the adjoint equation

Imre Fekete , András Molnár , Péter L. Simon

Categories: Machine Learning

2022-06-17

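This paper contributes to strengthening the relationship between machine learning and the theory of differential equations. In this context, the inverse problem of fitting the parameters and the initial condition of a differential equation to some measurements is a key issue. The paper explores an abstraction that can be used to construct a family of loss functions for fitting the solution of an initial value problem to a set of discrete or continuous measurements. It is shown that an extension of the adjoint equation can be used to derive the gradient of the loss function as a continuous analogue of backpropagation in machine learning. Numerical evidence is presented that, under reasonably controlled circumstances, the gradients obtained this way can be used in gradient descent to fit the solution of an initial value problem to a set of continuous noisy measurements, as well as to a set of discrete noisy measurements recorded at uncertain times.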
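
As a rough illustration of an adjoint-based gradient, the sketch below fits the decay rate p of the scalar equation x' = -p x to continuous measurements y(t) with the loss L(p) = integral over [0, T] of (x(t) - y(t))^2. It uses the classical adjoint equation lambda' = p lambda - 2 (x - y) with lambda(T) = 0 and the gradient dL/dp = integral of lambda * (-x), not the extended adjoint of the paper. The test equation, explicit Euler stepping, noise-free data, step size, and learning rate are all assumptions chosen for brevity.

```python
import math

def forward(p, x0, T, n):
    """Explicit Euler for x' = -p*x; returns x on the grid t_k = k*T/n."""
    h = T / n
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] + h * (-p * xs[-1]))
    return xs

def loss(p, y, x0, T, n):
    """Trapezoidal approximation of the integral of (x(t) - y(t))^2 over [0, T]."""
    h = T / n
    xs = forward(p, x0, T, n)
    r = [(xs[k] - y(k * h)) ** 2 for k in range(n + 1)]
    return h * (sum(r) - 0.5 * (r[0] + r[-1]))

def adjoint_gradient(p, y, x0, T, n):
    """dL/dp from the adjoint equation, integrated backward from lambda(T) = 0."""
    h = T / n
    xs = forward(p, x0, T, n)
    lam = [0.0] * (n + 1)
    for k in range(n, 0, -1):
        dlam = p * lam[k] - 2.0 * (xs[k] - y(k * h))  # lambda' at t_k
        lam[k - 1] = lam[k] - h * dlam                # step backward in time
    g = [lam[k] * (-xs[k]) for k in range(n + 1)]     # lambda * df/dp
    return h * (sum(g) - 0.5 * (g[0] + g[-1]))

def y(t):
    """Synthetic, noise-free measurements generated with the true rate 0.5."""
    return math.exp(-0.5 * t)

# Compare the adjoint gradient with a central finite difference of the loss.
g_adj = adjoint_gradient(1.0, y, 1.0, 2.0, 4000)
eps = 1e-5
g_fd = (loss(1.0 + eps, y, 1.0, 2.0, 4000) - loss(1.0 - eps, y, 1.0, 2.0, 4000)) / (2 * eps)

# Plain gradient descent on p, driven by the adjoint gradient.
p_fit = 1.0
for _ in range(60):
    p_fit -= 0.5 * adjoint_gradient(p_fit, y, 1.0, 2.0, 2000)

print(g_adj, g_fd, p_fit)
```

The finite-difference check is a useful guard when deriving an adjoint by hand, because a sign error in the backward equation shows up immediately as a mismatch.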

MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields

Ilyes Batatia , Dávid Péter Kovács , Gregor N. C. Simm , Christoph Ortner , Gábor Csányi

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-15

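Creating fast and accurate force fields is a long-standing challenge in computational chemistry and materials science. Recently, several equivariant message passing neural networks (MPNNs) have been shown to outperform models built with other approaches in terms of accuracy. However, most MPNNs suffer from high computational cost and poor scalability. We propose that these limitations arise because MPNNs pass only two-body messages, which leads to a direct relationship between the number of layers and the expressivity of the network. In this work, we introduce MACE, a new equivariant MPNN model that uses higher body order messages. In particular, we show that using four-body messages reduces the required number of message passing iterations to just two, resulting in a fast and highly parallelizable model that reaches or exceeds state-of-the-art accuracy on the rMD17, 3BPA, and AcAc benchmark tasks. We also demonstrate that using higher-order messages leads to steeper learning curves.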

Towards Robotic Laboratory Automation Plug & Play: Survey and Concept Proposal on Teaching-free Robot Integration with the LAPP Digital Twin

Ádám Wolf , Stefan Romeder-Finger , Károly Széll , Péter Galambos

Categories: Robotics

2022-05-17

The Laboratory Automation Plug & Play (LAPP) framework is an over-arching reference architecture concept for the integration of robots in life science laboratories. The plug & play nature lies in the fact that manual configuration is not required, including the teaching of the robots. In this paper a digital twin (DT) based concept is proposed that outlines the types of information that have to be provided for each relevant component of the system. In particular, for the devices interfacing with the robot, the robot positions have to be defined beforehand in a device-attached coordinate system (CS) by the vendor. This CS has to be detectable by the vision system of the robot by means of optical markers placed on the front side of the device. With that, the robot is capable of tending the machine by performing the pick-and-place type transportation of standard sample carriers. This basic use case is the primary scope of the LAPP-DT framework. The hardware scope is limited to simple benchtop and mobile manipulators with parallel grippers at this stage. This paper first provides an overview of relevant literature and state-of-the-art solutions, after which it outlines the framework on the conceptual level, followed by the specification of the relevant DT parameters for the robot, for the devices and for the facility. Finally, appropriate technologies and strategies are identified for the implementation.
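
The role of the device-attached coordinate system can be sketched as a frame transformation: the vendor defines robot positions once in the device CS, and the pose of that CS, as detected through the markers, maps them into the robot's base frame. The example below is a simplified planar (x, y, heading) sketch. The position names, numeric values, and the detected pose are hypothetical, and a real system would use full 3D poses.

```python
import math

def pose_to_matrix(x, y, theta):
    """Homogeneous 2D transform for a frame at (x, y) rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x],
            [s, c, y],
            [0.0, 0.0, 1.0]]

def transform_point(T, point):
    """Map a point given in the child frame into the parent frame of T."""
    px, py = point
    return (T[0][0] * px + T[0][1] * py + T[0][2],
            T[1][0] * px + T[1][1] * py + T[1][2])

# Hypothetical vendor-supplied positions, expressed in the device-attached CS (metres).
device_positions = {
    "approach": (0.10, 0.25),
    "load_port": (0.10, 0.05),
}

# Hypothetical result of marker detection: pose of the device CS in the robot base frame.
T_robot_device = pose_to_matrix(0.50, 0.20, math.pi / 2)

# Targets the robot can move to without being taught each position manually.
robot_targets = {name: transform_point(T_robot_device, p)
                 for name, p in device_positions.items()}
print(robot_targets)
```

Because only the detected device pose changes between installations, the vendor-defined positions stay valid wherever the device is placed, which is what makes teaching-free integration plausible.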

Data-driven modelling of nonlinear dynamics by polytope projections and memory

Niklas Wulkow , Péter Koltai , Vikram Sunkara , Christof Schütte

Categories: Machine Learning (Statistics)

2021-12-13

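We present a numerical method for modelling dynamical systems from data. We use the recently introduced method Scalable Probabilistic Approximation (SPA) to project points from a Euclidean space onto convex polytopes and represent the projected states of a system in new, lower-dimensional coordinates that denote their position in the polytope. We then introduce a specific nonlinear transformation to construct a model of the dynamics in the polytope and to transform back into the original state space. To overcome the potential loss of information caused by projecting onto a lower-dimensional polytope, we use memory in the sense of Takens' delay-embedding theorem. By construction, our method produces stable models. We illustrate on various examples the capacity of the method to reproduce even chaotic dynamics and attractors with multiple connected components.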
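
The use of memory can be illustrated with a plain delay embedding, independent of the SPA projection itself. In the sketch below, a rotation is observed only through one coordinate, so the current value alone does not determine the next one, while the pair (previous, current) does. The signal, the angular step `omega`, and the helper name `delay_embed` are illustrative assumptions.

```python
import math

def delay_embed(series, memory):
    """Return delay vectors (x[k-memory+1], ..., x[k]) for every k with enough history."""
    return [tuple(series[k - memory + 1:k + 1])
            for k in range(memory - 1, len(series))]

omega = 0.3
# One-dimensional observation of a two-dimensional rotation: information is lost.
observed = [math.cos(omega * k) for k in range(50)]

# With memory 2, x[k+1] = 2*cos(omega)*x[k] - x[k-1] holds exactly for this signal.
states = delay_embed(observed, 2)
predictions = [2.0 * math.cos(omega) * cur - prev for prev, cur in states[:-1]]
max_error = max(abs(p - t) for p, t in zip(predictions, observed[2:]))
print(max_error)
```

The same construction applies to projected coordinates: stacking a few past states restores information that a single projected state no longer carries.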

Equivariant Quantum Graph Circuits

Péter Mernyei , Konstantinos Meichanetzidis , İsmail İlkan Ceylan

Categories: Machine Learning

2021-12-10

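We investigate quantum circuits for graph representation learning and propose equivariant quantum graph circuits (EQGCs), a class of parameterized quantum circuits with a strong relational inductive bias for learning over graph-structured data. Conceptually, EQGCs serve as a unifying framework for quantum graph representation learning, allowing us to define several interesting subclasses that subsume existing proposals. In terms of representation power, we prove that the subclasses of interest are universal approximators for functions over the bounded graph domain, and we provide experimental evidence. Our theoretical perspective on quantum graph machine learning methods opens many directions for further work and could lead to models with capabilities beyond those of classical approaches.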

Towards Robotic Laboratory Automation Plug & Play: The "LAPP" Framework

Ádám Wolf , David Wolton , Josef Trapl , Julien Janda , Stefan Romeder-Finger , Thomas Gatternig , Jean-Baptiste Farcet , Péter Galambos , Károly Széll

Categories: Robotics

2021-06-18

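Increasing the level of automation in pharmaceutical laboratories and production facilities plays a crucial role. However, the particular requirements of this field make it challenging to adapt cutting-edge technologies that exist in other industries. This article provides an overview of relevant approaches and how they can be used in the pharmaceutical industry, especially in development laboratories. Recent advances include flexible mobile manipulators capable of handling complex tasks. However, integrating devices from many different vendors into an end-to-end automation system is complicated by the diversity of interfaces. This article therefore considers various standardization approaches and proposes a concept that takes them a step further. The concept enables a mobile manipulator with a vision system to "learn" the pose of each device and, by means of a barcode, to acquire interface information from a universal cloud database. This information includes control and communication protocol definitions and a representation of the robot actions needed to operate the device. To define the movements relative to the device, each device must carry, in addition to the barcode, a fiducial marker as standard. The concept will be elaborated in follow-up papers after appropriate research activities.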
